From the map above, we can see that the drinking water index is lower in the eastern part of the Bay Area than anywhere else in the region.

The leaflet() map above shows that the North Bay and the San Jose area have a higher drinking water index. In other words, these regions have had more violations or contaminants in the past.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   103.3   198.0   267.2   277.1   349.5   941.4       3

The equity analysis above shows that a disproportionate number of white households experience a high drinking water index. For index values from 600 to 1000, the proportion of white households is noticeably higher than the white share of all households. The result indicates that white households in the Bay Area experienced a disproportionate share of drinking water contamination between 2005 and 2013. However, white households also make up a larger share of the Bay Area population than any other racial group. Note that the values for the 1000-1100 range are NA.
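The comparison described above, the share of white households within each index bin against their overall share, can be sketched in base R as follows. The data frame `toy`, its column names, and the counts are made up for illustration; they are not the values behind the chart.

```r
# Hypothetical toy data: household counts by index bin and race (not the real data).
toy <- data.frame(
  index_bin  = c("100-200", "100-200", "600-700", "600-700"),
  race       = c("White", "Other", "White", "Other"),
  households = c(400, 600, 80, 20)
)

is_white <- toy$race == "White"

# Overall share of white households across all bins.
overall_white <- sum(toy$households[is_white]) / sum(toy$households)

# Share of white households within each bin.
white_by_bin <- tapply(toy$households[is_white], toy$index_bin[is_white], sum)
total_by_bin <- tapply(toy$households, toy$index_bin, sum)
share_by_bin <- white_by_bin / total_by_bin[names(white_by_bin)]

# A ratio above 1 means the group is over-represented in that bin
# relative to its share of all households.
representation_ratio <- share_by_bin / overall_white
print(round(representation_ratio, 2))
```

Dividing by the overall share is what addresses the caveat that white households are also the largest group: a bin counts as disproportionate only when its ratio is well above 1.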

Continue to Income vs. Water Quality Index

    income == "Less than $10,000"~"0-10,000",
    income == "$10,000 to $14,999"~"10,000-15,000",
    income == "$15,000 to $19,999"~"15,000-20,000",
    income == "$20,000 to $24,999"~"20,000-25,000",
    income == "$25,000 to $29,999"~"25,000-30,000",
    income == "$30,000 to $34,999"~"30,000-35,000",
    income == "$35,000 to $39,999"~"35,000-40,000",
    income == "$40,000 to $44,999"~"40,000-45,000",
    income == "$45,000 to $49,999"~"45,000-50,000",
    income == "$50,000 to $59,999"~"50,000-60,000",
    income == "$60,000 to $74,999"~"60,000-75,000",
    income == "$75,000 to $99,999"~"75,000-100,000",
    income == "$100,000 to $124,999"~"100,000-125,000",
    income == "$125,000 to $149,999"~"125,000-150,000",
    income == "$150,000 to $199,999"~"150,000-200,000",
    income == "$200,000 or more"~">200,000"

    factor(levels = rev(c(
      "income", "$10,000 to $14,999", "$100,000 to $124,999", "$125,000 to $149,999",
      "$15,000 to $19,999", "$150,000 to $199,999", "$20,000 to $24,999", "$200,000 or more",
      "$25,000 to $29,999", "$30,000 to $34,999", "$35,000 to $39,999", "$40,000 to $44,999",
      "$45,000 to $49,999", "$50,000 to $59,999", "$60,000 to $74,999", "$75,000 to $99,999",
      "Less than $10,000"
    )))
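The census labels above are listed alphabetically rather than by dollar amount (for example, "$100,000 to $124,999" comes before "$15,000 to $19,999"), so it can help to set the factor levels explicitly in income order before plotting. The following is a minimal sketch using only a few brackets; `incomes` and `income_order` are illustrative names, and the ordering used for the original chart may differ.

```r
# Illustrative subset of the census income labels, in an arbitrary order.
incomes <- c("$100,000 to $124,999", "Less than $10,000",
             "$15,000 to $19,999", "$10,000 to $14,999")

# Levels listed from lowest to highest bracket (hypothetical helper vector).
income_order <- c("Less than $10,000", "$10,000 to $14,999",
                  "$15,000 to $19,999", "$100,000 to $124,999")

# rev() reverses the level order, as the rev() call above does,
# so the highest bracket becomes the first level.
income_factor <- factor(incomes, levels = rev(income_order))

# Inspect the resulting level order and the integer codes behind it.
print(levels(income_factor))
print(as.integer(income_factor))
```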

The equity analysis above does not show any inequity between household income and the drinking water index.

Survey Regression on Income vs. Drinking Water Index

## 
## Call:
## lm(formula = `Drinking Water` ~ per_over75k, data = bay_water_income_lm)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -191.92  -80.55    0.73   52.79  545.54 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   245.14      20.65  11.872   <2e-16 ***
## per_over75k   142.27      85.67   1.661   0.0978 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 121.6 on 314 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.008706,   Adjusted R-squared:  0.005549 
## F-statistic: 2.758 on 1 and 314 DF,  p-value: 0.09779

Based on the regression above, the drinking water index is estimated to increase by 142.27 for every unit increase in per_over75k, the percent of households with income greater than $75,000. However, the p-value for this coefficient is 0.0978, which is greater than 0.05, so the relationship is not statistically significant at the 5% level. As a result, the percent of households with income greater than $75,000 is not a good predictor of the drinking water index. In addition, the adjusted R-squared is 0.005549, which means that variation in per_over75k explains only about 0.55% of the variation in the drinking water index.
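The quantities cited above (slope, p-value, adjusted R-squared) can be pulled directly from a fitted model rather than read off the printout. This is a minimal sketch on simulated data; the data frame `sim` and its columns are hypothetical stand-ins for `bay_water_income_lm`, so the numbers will not match the output above.

```r
set.seed(1)

# Simulated stand-in data: 100 rows, a predictor in [0, 1], and a noisy index.
sim <- data.frame(per_over75k = runif(100))
sim$index <- 250 + 140 * sim$per_over75k + rnorm(100, sd = 120)

fit <- lm(index ~ per_over75k, data = sim)
s <- summary(fit)

# coef(summary(fit)) is a matrix with one row per term.
slope   <- coef(s)["per_over75k", "Estimate"]
p_value <- coef(s)["per_over75k", "Pr(>|t|)"]
adj_r2  <- s$adj.r.squared

# Compare the p-value to the chosen significance level.
alpha <- 0.05
cat("slope:", round(slope, 2),
    "| p-value:", signif(p_value, 3),
    "| significant at 5%:", p_value < alpha,
    "| adjusted R-squared:", round(adj_r2, 4), "\n")
```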

Now try to factor in Race

## 
## Call:
## lm(formula = `Drinking Water` ~ perc_over75k + perc_white, data = bay_water_income_lm1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -233.35  -78.42   -3.41   63.92  516.78 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    274.94      19.23  14.301  < 2e-16 ***
## perc_over75k  -917.60     384.89  -2.384  0.01772 *  
## perc_white     189.32      63.02   3.004  0.00288 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 120.1 on 313 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0362, Adjusted R-squared:  0.03004 
## F-statistic: 5.879 on 2 and 313 DF,  p-value: 0.003117

With the addition of the percentage of white households, the regression produces p-values that are less than 0.05 for both predictors. The analysis suggests that the drinking water index decreases by 917.6 for every unit increase in perc_over75k, the percent of households with income over $75,000. A lower drinking water index represents less chemical contamination and fewer violations in drinking water between 2005 and 2013. The analysis also suggests that the drinking water index increases by 189.32 for every unit increase in perc_white, the white proportion of the total population. This result agrees with the conclusion from the previous equity analysis of race and the drinking water index. The adjusted R-squared is 0.03, which means that race and income together explain about 3% of the variation in the drinking water index.

The result is similar to the celebrity example from lecture, in which two variables contribute to becoming a celebrity. In this case, both race and income are associated with variation in the drinking water index. Still, we can never be sure that the relationship between the variables is real. There are many examples of two unrelated variables that appear to be correlated. Therefore, we need to consider other possible variables that also affect the drinking water index.
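One possible reason a coefficient can change sign when a second predictor is added is correlation between the two predictors. The sketch below reproduces that pattern on simulated data; the variables `income_share` and `white_share` and all coefficients are invented for illustration and say nothing about the actual Bay Area data.

```r
set.seed(42)
n <- 500

# Simulated predictors that are positively correlated with each other.
white_share  <- runif(n)
income_share <- 0.8 * white_share + rnorm(n, sd = 0.1)

# The simulated index rises with white_share and falls with income_share.
index <- 270 + 200 * white_share - 150 * income_share + rnorm(n, sd = 20)

# With income alone, the slope absorbs part of the white_share effect.
fit_income <- lm(index ~ income_share)

# With both predictors, each slope is estimated holding the other fixed.
fit_both <- lm(index ~ income_share + white_share)

print(coef(fit_income)["income_share"])
print(coef(fit_both)[c("income_share", "white_share")])
```

In this simulation the income slope is positive on its own and negative once `white_share` is included, which is why a single-predictor model and a two-predictor model should not be expected to agree on the direction of an effect.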